EntityBases: Compiling, Organizing and Querying Massive Entity Repositories

Authors

  • Craig A. Knoblock
  • José Luis Ambite
  • Kavita Ganesan
  • Maria Muslea
  • Steven Minton
  • Greg Barish
  • Evan Gamble
  • Claude Nanjo
  • Kane See
  • Cyrus Shahabi
  • Ching-Chien Chen
Abstract

Current approaches to linking information across sources, often called record linkage, require finding common attributes between the sources and comparing records on those attributes. The results are frequently unsatisfactory because the sources may be missing information or contain incorrect or outdated information. We are addressing this problem by developing the technology to build massive entity knowledge bases, which we call EntityBases. The key idea is to create a comprehensive knowledge base for the entities of interest (e.g., companies). To build such a knowledge base, we must address the issues of linking entities with multi-valued attributes obtained from heterogeneous sources and providing a virtual repository that can be efficiently queried. This paper describes how we have addressed these issues and shows how an EntityBase™ can be used for understanding and linking text documents.
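
As a rough illustration of the attribute-based comparison the abstract describes, the sketch below scores two records whose attributes may each hold several values. It is not the EntityBases method: the function names, the use of `difflib.SequenceMatcher` as the string measure, the best-pair rule for multi-valued attributes, and the sample records are all assumptions made for this example.

```python
from __future__ import annotations

from difflib import SequenceMatcher


def value_similarity(a: str, b: str) -> float:
    """Case-insensitive string similarity in [0, 1] (assumed measure)."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def attribute_similarity(values_a: set[str], values_b: set[str]) -> float:
    """Best pairwise similarity between two multi-valued attributes.

    Returns 0.0 when either side has no values (missing information).
    """
    if not values_a or not values_b:
        return 0.0
    return max(value_similarity(a, b) for a in values_a for b in values_b)


def record_similarity(rec_a: dict[str, set[str]], rec_b: dict[str, set[str]]) -> float:
    """Average attribute similarity over attributes populated in both records."""
    shared = [k for k in rec_a if k in rec_b and rec_a[k] and rec_b[k]]
    if not shared:
        return 0.0
    return sum(attribute_similarity(rec_a[k], rec_b[k]) for k in shared) / len(shared)


# Hypothetical records: one entity with two known names, and two candidates.
entity = {"name": {"International Business Machines", "IBM"}, "city": {"Armonk"}}
mention = {"name": {"IBM"}, "city": {"Armonk"}, "phone": set()}
other = {"name": {"Acme Widgets"}, "city": {"Springfield"}}

score_match = record_similarity(entity, mention)
score_other = record_similarity(entity, other)
```

Because the entity keeps every known name, the short form "IBM" matches exactly even though the full legal name does not, which is the kind of case where comparing a single attribute value per record tends to fail.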


Similar resources

XQuery Evaluation with Relevance Ranking in Structured Peer-to-Peer Systems

This paper addresses the problem of publishing, indexing, and querying large XML data repositories distributed over an existing peer-to-peer (P2P) service infrastructure. Our architecture scales gracefully to the network and data sizes by supporting thousands of nodes, massive data, and frequent queries and updates. It is fully distributed, fault tolerant and self-organizing, and handles comple...


Analysis and design of approximate queries over XML documents using statistical techniques

In the last few years, several repositories for storing XML documents and languages for querying XML data have been studied and implemented. All the query languages proposed so far allow users to obtain exact answers, but when applied to large XML repositories or warehouses, such precise queries may require high response times. To overcome this problem, in traditional relational warehouses fast approx...


Query-By-Keywords (QBK): Query Formulation Using Semantics and Feedback

The staples of information retrieval have been querying and search, respectively, for structured and unstructured repositories. Processing queries over known, structured repositories (e.g., Databases) has been well-understood, and search has become ubiquitous when it comes to unstructured repositories (e.g., Web). Furthermore, searching structured repositories has been explored to a limited ext...


DBGlobe: A Data-Centric Approach to Global Computing

In the near future, there will be increasingly powerful computers in smart cards, telephones, and other information appliances. This will create a massive infrastructure composed of highly diverse interconnected mobile entities. In this paper, we present a data-centric approach to storage and querying in such environments. At a first level, we view each entity as a miniature database; at a seco...


A Native Extensible XML Query Processor Towards Efficient and Effective MPEG-7 Querying

In recent years the production of massive amounts of visual information has led to the arrival of very large multimedia Digital Libraries (DLs). The key to support efficient search and management operations in such repositories is to exploit metadata information for digital media, such as MPEG7 [4] based ones, which seem to be the most widely accepted. The underlying XML syntax, together with t...




Publication date: 2007